Kazakh Segmentation System of Inflectional Affixes
Authors
Abstract
This paper addresses the automatic segmentation of inflectional affixes in the Kazakh language (KL), based on a study of a KL corpus. Kazakh is an agglutinative language whose word forms are built by productively attaching derivational and inflectional suffixes to stems. First, based on an analysis of how inflectional affixes combine, the paper constructs a finite-state automaton for segmenting them. Second, it constructs dedicated finite-state automata for nouns and verbs, the parts of speech with the most variable and complex inflection in KL. Third, it applies bidirectional omni-word segmentation and lexical analysis to perform stemming and fine-grained segmentation of KL inflectional affixes. Finally, it gives a supplementary account of how ambiguous inflectional affixes are segmented. The aim is to improve both the accuracy and the speed of stemming and inflectional-affix segmentation for KL.
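The slot-by-slot idea behind a finite-state treatment of noun suffixes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the suffix lists, the two-slot order (plural, then case), the stem lexicon `STEMS`, and the function name `segment` are all assumptions chosen for illustration, and the sketch ignores possessive and personal endings as well as vowel-harmony checks. It scans from the end of the word back toward the stem, lets each slot be empty or filled, and keeps every analysis whose remainder is a known stem.

```python
# Toy suffix inventory (illustrative only, far from complete).
PLURAL = ["лар", "лер", "дар", "дер", "тар", "тер"]
CASE = ["ға", "ге", "қа", "ке", "да", "де", "та", "те"]

# Slots in the order they attach to the stem (assumed: plural before case).
SLOTS = [("PL", PLURAL), ("CASE", CASE)]

# Hypothetical stem lexicon used to validate candidate stems.
STEMS = {"бала", "кітап"}


def segment(word):
    """Return every (stem, [(tag, suffix), ...]) analysis the toy automaton accepts."""
    results = []

    def strip(rest, slot, found):
        # slot is the index of the next slot to try, moving from the
        # outermost slot back toward the stem.
        if slot < 0:
            if rest in STEMS:
                # Suffixes were collected right to left; restore surface order.
                results.append((rest, list(reversed(found))))
            return
        tag, suffixes = SLOTS[slot]
        # Option 1: this slot is empty.
        strip(rest, slot - 1, found)
        # Option 2: this slot is filled by a suffix that matches the word end.
        for suffix in suffixes:
            if rest.endswith(suffix) and len(rest) > len(suffix):
                strip(rest[: -len(suffix)], slot - 1, found + [(tag, suffix)])

    strip(word, len(SLOTS) - 1, [])
    return results


if __name__ == "__main__":
    for w in ["балаларға", "кітаптарда", "бала"]:
        print(w, segment(w))
```

Because every accepted path is collected rather than only the first one, an ambiguous word would yield several analyses, which a later disambiguation step would have to rank.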
Similar resources
Incremental Learning of Affix Segmentation
This paper presents a supervised machine learning approach to incrementally learn and segment affixes using generic background knowledge. We used Prolog script to split an affix from the Amharic word for further morphological analysis. Amharic, a Semitic language, has very complex inflectional and derivational verb morphology, with many possible prefixes and suffixes which are used to show vari...
Identification of Basic Phrases for Kazakh Language using Maximum Entropy Model
This paper proposes the definition, classification and structure of the Kazakh basic phrases, and sets up a framework for classifying them according to their syntactic functions. Meanwhile, the structure of the Kazakh basic phrases was analyzed; and the determination of the Kazakh basic phrases collocation and extraction of the Kazakh basic phrases based on rules were followed. The Maximum Ent...
Stem alternations and multiple exponence
In a canonical inflectional paradigm, inflectional affixes mark distinctions in morphosyntactic value, while the lexical stem remains invariant. But stems are known to alternate too, constituting a system of inflectional marking operating according to parameters which typically differ from those of the affixal system, and so represent a distinct object of inquiry. Cross-linguistically, we still...
A Lexicalized Tree Adjoining Grammar for Thai
This paper describes an alternative formalism for Thai syntax parsing based on a lexicalized tree adjoining grammar (LTAG). We first briefly present some formal background concerning LTAG, which is necessary for an understanding of LTAG and its application to Thai. Specifically, we address several issues regarding difficulties in parsing Thai sentences and how to resolve these issues using LTAG...
22 English Inflection and Derivation
Modern English approaches the ideal of an isolating language. Open-class items have comparatively few forms, so that many inflectional categories either remain unmarked, or are expressed periphrastically. The inflectional system is particularly simple, even by the standards of a West Germanic language. Regular paradigms contain at most four forms, and the inflectional exponents that distinguish...